[BlockSTM] Add latency counter for profiling BlockSTM #6956

Merged
sitalkedia merged 3 commits into main from block_stm_counters_1
Mar 8, 2023

Conversation


@sitalkedia sitalkedia commented Mar 6, 2023

Description

Add more instrumentation to Block-STM to help identify performance bottlenecks. Also fix APTOS_EXECUTOR_EXECUTE_BLOCK_SECONDS so that it measures the latency of the entire function call.
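
The description does not say how the timer came to miss part of the call, so the following is only an illustrative sketch of one common way this happens in Rust with drop-based timers. `LatencyRecorder`, `TimerGuard`, and both `execute_block_*` functions are hypothetical names using only the standard library; they are not the Aptos or Prometheus APIs. A guard bound to a named variable lives until the end of the function, while a guard bound to `_` is dropped immediately and records almost nothing.

```rust
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a latency histogram: stores observations in seconds.
pub struct LatencyRecorder {
    samples: Mutex<Vec<f64>>,
}

impl LatencyRecorder {
    pub fn new() -> Self {
        Self { samples: Mutex::new(Vec::new()) }
    }

    /// Start a timer; the elapsed time is recorded when the returned guard is dropped.
    pub fn start_timer(&self) -> TimerGuard<'_> {
        TimerGuard { recorder: self, start: Instant::now() }
    }

    /// Copy of all observations recorded so far.
    pub fn samples(&self) -> Vec<f64> {
        self.samples.lock().unwrap().clone()
    }
}

/// Records the elapsed time into its recorder on drop.
pub struct TimerGuard<'a> {
    recorder: &'a LatencyRecorder,
    start: Instant,
}

impl Drop for TimerGuard<'_> {
    fn drop(&mut self) {
        let secs = self.start.elapsed().as_secs_f64();
        self.recorder.samples.lock().unwrap().push(secs);
    }
}

/// Guard bound to a named variable: it lives until the function returns,
/// so the observation covers the entire call.
pub fn execute_block_whole_call(recorder: &LatencyRecorder, work: Duration) {
    let _timer = recorder.start_timer();
    std::thread::sleep(work); // stands in for the real work
}

/// Guard bound to `_`: it is dropped on this line, before the work starts,
/// so the observation excludes the work.
pub fn execute_block_dropped_early(recorder: &LatencyRecorder, work: Duration) {
    let _ = recorder.start_timer();
    std::thread::sleep(work);
}
```

Calling both functions with the same `work` duration gives one observation each, but only the first one includes the sleep.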

Test Plan

Tested with Forge and executor-benchmark to ensure this change doesn't introduce any regression.

@sitalkedia sitalkedia added the CICD:run-e2e-tests label (when this label is present, GitHub Actions will run all land-blocking e2e tests from the PR) Mar 6, 2023
@sitalkedia sitalkedia requested a review from grao1991 March 6, 2023 22:26
@sitalkedia sitalkedia enabled auto-merge (squash) March 6, 2023 22:31

github-actions bot commented Mar 6, 2023

✅ Forge suite compat success on testnet_2d8b1b57553d869190f61df1aaf7f31a8fc19a7b ==> a491021dab57520dd0a86ee69682b4bc4f274c5c

Compatibility test results for testnet_2d8b1b57553d869190f61df1aaf7f31a8fc19a7b ==> a491021dab57520dd0a86ee69682b4bc4f274c5c (PR)
1. Check liveness of validators at old version: testnet_2d8b1b57553d869190f61df1aaf7f31a8fc19a7b
compatibility::simple-validator-upgrade::liveness-check : 7953 TPS, 4786 ms latency, 7300 ms p99 latency,no expired txns
2. Upgrading first Validator to new version: a491021dab57520dd0a86ee69682b4bc4f274c5c
compatibility::simple-validator-upgrade::single-validator-upgrade : 5074 TPS, 8038 ms latency, 11100 ms p99 latency,no expired txns
3. Upgrading rest of first batch to new version: a491021dab57520dd0a86ee69682b4bc4f274c5c
compatibility::simple-validator-upgrade::half-validator-upgrade : 4744 TPS, 8546 ms latency, 11600 ms p99 latency,no expired txns
4. upgrading second batch to new version: a491021dab57520dd0a86ee69682b4bc4f274c5c
compatibility::simple-validator-upgrade::rest-validator-upgrade : 7144 TPS, 5392 ms latency, 8700 ms p99 latency,no expired txns
5. check swarm health
Compatibility test for testnet_2d8b1b57553d869190f61df1aaf7f31a8fc19a7b ==> a491021dab57520dd0a86ee69682b4bc4f274c5c passed
Test Ok

github-actions bot commented Mar 6, 2023

✅ Forge suite framework_upgrade success on cb4ba0a57c998c60cbab65af31a64875d2588ca5 ==> a491021dab57520dd0a86ee69682b4bc4f274c5c

Compatibility test results for cb4ba0a57c998c60cbab65af31a64875d2588ca5 ==> a491021dab57520dd0a86ee69682b4bc4f274c5c (PR)
Upgrade the nodes to version: a491021dab57520dd0a86ee69682b4bc4f274c5c
framework_upgrade::framework-upgrade::full-framework-upgrade : 6913 TPS, 5540 ms latency, 8300 ms p99 latency,no expired txns
5. check swarm health
Compatibility test for cb4ba0a57c998c60cbab65af31a64875d2588ca5 ==> a491021dab57520dd0a86ee69682b4bc4f274c5c passed
Test Ok

github-actions bot commented Mar 6, 2023

✅ Forge suite land_blocking success on a491021dab57520dd0a86ee69682b4bc4f274c5c

performance benchmark with full nodes : 5919 TPS, 6627 ms latency, 16300 ms p99 latency,(!) expired 2780 out of 2530320 txns
Test Ok

@sitalkedia sitalkedia changed the title from "[BlockSTM] Add latency counter for fetching next task" to "[BlockSTM] Add latency counter for profiling BlockSTM" Mar 7, 2023

@danielxiangzl danielxiangzl left a comment

I ran the parallel-execution-only benchmark on this PR, and there seems to be no performance degradation. Can you also try the execution benchmark to check its performance?

@sitalkedia

@danielxiangzl - Yes, I ran both Forge and the execution benchmark with and without this change, and there is no performance degradation.

@gelash gelash left a comment

Thanks!

pub static PARALLEL_EXECUTION_SECONDS: Lazy<Histogram> = Lazy::new(|| {
    register_histogram!(
        // metric name
        "aptos_parallel_execution_seconds",

I wonder if it would help to have all these names start with "aptos_execution", in which case we could make this one aptos_execution_seconds and the next one aptos_execution_rayon_seconds or something similar?

Contributor Author

@gelash - The PR got auto-merged; I will sneak the renaming changes into another PR if that's okay.
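
As a rough sketch of the shared-prefix idea in this thread, metric names could be derived from a single constant so that related metrics sort and filter together. The constant and helper below are hypothetical and are not part of the PR; only the resulting name strings come from the discussion.

```rust
/// Shared prefix suggested in the review; the constant itself is hypothetical.
pub const EXECUTION_METRIC_PREFIX: &str = "aptos_execution";

/// Build a metric name such as "aptos_execution_rayon_seconds" from a suffix.
pub fn execution_metric_name(suffix: &str) -> String {
    format!("{}_{}", EXECUTION_METRIC_PREFIX, suffix)
}
```

With this helper, `execution_metric_name("seconds")` and `execution_metric_name("rayon_seconds")` produce the two names proposed above.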

        // metric name
        "aptos_execution_get_next_task_seconds",
        // metric description
        "The time spent in seconds for getting next task from the scheduler",

Block-STM scheduler (or Block Executor scheduler, or Parallel Executor / execution scheduler)
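
To make concrete what a "get next task" latency counter separates out, here is a minimal single-threaded sketch that accumulates the time spent fetching tasks apart from the time spent running them. `ToyScheduler`, `LoopTimings`, and `worker_loop` are hypothetical and far simpler than the real Block-STM scheduler, which coordinates multiple worker threads; the sketch assumes only the standard library.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// Hypothetical scheduler: hands out task indices until the queue is empty.
pub struct ToyScheduler {
    queue: VecDeque<usize>,
}

impl ToyScheduler {
    pub fn new(num_tasks: usize) -> Self {
        Self { queue: (0..num_tasks).collect() }
    }

    pub fn next_task(&mut self) -> Option<usize> {
        self.queue.pop_front()
    }
}

/// Time accumulated in each phase of the worker loop.
#[derive(Debug, Default)]
pub struct LoopTimings {
    pub get_next_task: Duration,
    pub run_task: Duration,
    pub tasks_run: usize,
}

/// Drain the scheduler, timing the fetch and the execution of each task separately.
pub fn worker_loop(scheduler: &mut ToyScheduler, mut run: impl FnMut(usize)) -> LoopTimings {
    let mut timings = LoopTimings::default();
    loop {
        // Phase 1: ask the scheduler for work and record how long that took.
        let fetch_start = Instant::now();
        let task = scheduler.next_task();
        timings.get_next_task += fetch_start.elapsed();

        let idx = match task {
            Some(idx) => idx,
            None => break, // no more work
        };

        // Phase 2: run the task and record its duration.
        let run_start = Instant::now();
        run(idx);
        timings.run_task += run_start.elapsed();
        timings.tasks_run += 1;
    }
    timings
}
```

If `get_next_task` grows large relative to `run_task`, workers are spending their time waiting on the scheduler rather than executing, which is the kind of bottleneck such a counter is meant to expose.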

        // metric name
        "aptos_execution_work_with_task_seconds",
        // metric description
        "The time spent in work task with scope call in Block STM",

It seems most descriptions say "parallel execution" (or we could use "Block STM" or "parallel / block executor" everywhere).

@sitalkedia sitalkedia merged commit 66c0a3e into main Mar 8, 2023
@sitalkedia sitalkedia deleted the block_stm_counters_1 branch March 8, 2023 12:47